Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Addresses Content-Type mismatch on HTTP 301 Moved Permanently with FileGetContents #709

Merged
merged 4 commits into from
May 27, 2024

Conversation

DannyvdSluijs
Copy link
Collaborator

This PR attempts to resolve issue #694, in order to do so three commits where added

  • The first adds a failing test showing the issue
  • The second refactors the usage of $http_response_header [link] with get_headers() [link]
  • The third commit reverse the headers before parsing to match on the latest Content-Type header.

@mxr576
Copy link

mxr576 commented Feb 7, 2024

Based on our extensive test suite, I can confirm that this patch solves the reported issue 👍 Although it also exposes the performance bottleneck that comes with external references when they have to be fetched synchronously in a single-threaded fashion. 👎

Without patch with 3.0.0-without-%24id.json

$ ./vendor/bin/phpunit --filter Async
PHPUnit 9.6.16 by Sebastian Bergmann and contributors.

Runtime:       PHP 8.1.21

EEE..............SS..........SS                                   31 / 31 (100%)

Time: 00:01.500, Memory: 16.00 MB

With patch with 3.0.0.json.

$ ./vendor/bin/phpunit --filter Async
PHPUnit 9.6.16 by Sebastian Bergmann and contributors.

Runtime:       PHP 8.1.21

.................SS..........SS                                   31 / 31 (100%)

Time: 00:48.817, Memory: 16.00 MB

So it is good that the problem gets fixed but for us, 💯 but it is better if we stick with https://github.com/asyncapi/spec-json-schemas/blob/master/schemas/3.0.0-without-%24id.json instead of https://github.com/asyncapi/spec-json-schemas/blob/master/schemas/3.0.0.json.

@DannyvdSluijs
Copy link
Collaborator Author

So if I understand your comments correctly everything works but in your test suite you need to shift to another specification due to time constraints?

I've tried to understand the phpunit outputs and I can see the patch adds a 47 seconds time increase, but that can also be becuase of the different schema you're loading. WHat would be the Time of running with patch and 3.0.0-without-%24id.json and running without patch and 3.0.0.json as I expect the schema to be the cause and not the patch itself. At least I can think of a reason why the patch would introduce a 47 second delay. But I'm open to input in order to understand better.

In addition I've played around with the schema that is being fetched in the added test but there seems to be no impact on the timing of the test itself no matter which schema I pick.

@mxr576
Copy link

mxr576 commented Feb 27, 2024

I also believe the different schema causes/caused the delay, precisely that external references have to be resolved in the https://github.com/asyncapi/spec-json-schemas/blob/master/schemas/3.0.0.json spec synchrinously, in a blocking fashion. I have not ran a profiling on the code, but this is my strong feeling.

From this PR point of view, even if this is true, this works as designed because fixing this problem would require some refactoring - I assume. Also, admittedly, my test suite is stress testing this code because it parses several API specs with multiple data providers.

Copy link

@dafeder dafeder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand the performance problem being flagged in the comments, but it seems to me it is a separate issue and should not block this. I would recommend re-thinking the tests for this though.

IMHO this is enough of an edge case to not block 6.0 release.

{
$res = new FileGetContents();

$res->retrieve('http://asyncapi.com/definitions/2.0.0/asyncapi.json');
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this is, I think, to be avoided. We don't have control over that website or server configuration and it could easily break. Can we perform the same test by mocking a response and not requiring a real external network call? Tests should ideally run without an internet connection.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree with your point to some extend. In order to test we actually need a schema which is available using an HTTP 301 redirect. My point for using this specific URL is because it is relative safe as it belongs to an organisation and we point to their definition which would not change quickly.

// See http://php.net/manual/en/reserved.variables.httpresponseheader.php for more info.
$this->fetchContentType($http_response_header); // @codeCoverageIgnore
} else { // @codeCoverageIgnore
$headers = strpos($uri, 'http') === 0 ? get_headers($uri) : array();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems bad to me to switch to get_headers as, as far as I understand, this does a second HTTP request just to fetch the headers. Whereas $http_response_header is populated by file_get_contents in a single HTTP request.

So I'm not exactly sure what this fixes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok reading the issue again, I see what needs solving is fetchContentType needs to stop doing the early return. You need to iterate through all the headers and get the last content-type header's value, as that will be the final response. See https://github.com/composer/composer/blob/main/src/Composer/Util/RemoteFilesystem.php#L173-L177 for example.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your points and only realise now that indeed a secondary call is made. This triggered me to look further into the issue and found an override for file_get_contest was made in the test causing the $http_response_headers to not be set in the tests. Further looking the test that depended on the override were having different behaviour between the test and runtime as file_get_contents can return false without invoking the defined error handler. Removing the test and the override seemed like to correct way forward.

DannyvdSluijs and others added 3 commits May 26, 2024 21:17
Overriding file_get_contents introduces different behaviour from the native function, such as the $http_response_headers missing. Removing the override revealed two test (testing a specific return path of the file_get_contents method rather then testing the behaviour or result of the subject under test) which had different behaviour between the test and runtime, therefor these tests have been removed.
…g matches on HTTP 301 redirect headers which are listed first
@DannyvdSluijs
Copy link
Collaborator Author

I don't really understand the performance problem being flagged in the comments, but it seems to me it is a separate issue and should not block this. I would recommend re-thinking the tests for this though.

IMHO this is enough of an edge case to not block 6.0 release.

This can also be the result of using the get_headers which was part of the original PR. After Jordi's comments I've reviewed and change the PR back to the original and only fixed the issue and no longer alter the logic to get the response headers.

Copy link
Contributor

@Seldaek Seldaek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't look at tests but the fix itself lgtm

@DannyvdSluijs DannyvdSluijs merged commit 47708f5 into jsonrainbow:master May 27, 2024
15 checks passed
@DannyvdSluijs DannyvdSluijs deleted the Issue-694 branch May 27, 2024 11:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants